AITopics | coreset size

Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

Ziyi Fang (Nanjing University), Lingxiao Huang (Nanjing University), Runkai Yang (Nanjing University)

Neural Information Processing Systems, Jun-14-2026, 11:53:44 GMT

We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$.

data mining, dist, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Asia > China > Jiangsu Province > Nanjing (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement

Neural Information Processing Systems, Apr-24-2026, 07:16:37 GMT

Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple-to-implement, black-box method that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.
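As a rough illustration of the first stage described in this abstract (uniform subsampling), the sketch below draws a random subset and evaluates a weighted log-likelihood on it. It is not the authors' algorithm: the quasi-Newton weight refinement is omitted entirely, the initial weight N/M for each selected point is an assumption chosen so that the weights sum to the dataset size, and the unit-variance Gaussian model and the function names `log_lik`, `uniform_coreset`, and `coreset_log_lik` are hypothetical.

```python
import math
import random

def log_lik(x, theta):
    """Log-density of x under a unit-variance Gaussian with mean theta (toy model)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2

def uniform_coreset(data, size, rng=None):
    """Subsampling stage only: pick `size` points uniformly without replacement.

    Each selected point gets the weight N / size (an assumed initialization),
    so the weights sum to N. The weight-optimization stage is not shown.
    """
    rng = rng or random.Random()
    idx = rng.sample(range(len(data)), size)
    weight = len(data) / size
    return [data[i] for i in idx], [weight] * size

def coreset_log_lik(points, weights, theta):
    """Weighted log-likelihood evaluated on the coreset points only."""
    return sum(w * log_lik(x, theta) for x, w in zip(points, weights))
```

With these uniform weights the coreset log-likelihood is only a noisy estimate of the full-data log-likelihood; per the abstract, the subsequent weight optimization is what improves coreset quality.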

artificial intelligence, machine learning, theorem 4, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.34)
Construction & Engineering (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

90fd4f88f588ae64038134f1eeaa023f-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 22:01:04 GMT

algorithm, coreset, dist, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (0.46)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Data Science > Data Mining (0.68)

90fd4f88f588ae64038134f1eeaa023f-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 22:01:00 GMT

Previous coreset constructions only allow one missing coordinate.

artificial intelligence, coreset, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

ab452534c5ce28c4fbb0e102d4a4fb2e-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 18:41:19 GMT

construction, experiment, gradient, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems

Neural Information Processing Systems, Feb-9-2026, 10:43:43 GMT

Moreover, our robust coreset can be efficiently maintained in a fully-dynamic environment. To the best of our knowledge, this is the first robust and fully-dynamic coreset construction method for these optimization problems.

artificial intelligence, machine learning, outlier, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Virginia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

03287fcce194dbd958c2ec5b33705912-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 07:56:58 GMT

coreset, glse, panel data, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Ohio (0.04)
(4 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

Fang, Ziyi, Huang, Lingxiao, Yang, Runkai

arXiv.org Machine Learning, Oct-29-2025

We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them. Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.
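To make the quantity being approximated concrete, the following sketch evaluates a robust geometric-median cost and the relative error of a weighted summary at one center. It assumes the common definition in which the robust cost is the total distance to the center after discarding $m$ outliers (the farthest points), and for weighted sets it assumes that a total weight of $m$ is removed from the farthest points, possibly fractionally. The function names are hypothetical and this is only a way to check a candidate coreset, not the paper's construction.

```python
import math

def robust_cost(points, weights, center, m):
    """Weighted sum of distances to `center` after removing total weight m
    from the farthest points (assumed definition of the robust cost)."""
    dists = sorted(
        ((math.dist(p, center), w) for p, w in zip(points, weights)),
        reverse=True,  # farthest points first
    )
    to_drop = float(m)
    cost = 0.0
    for d, w in dists:
        drop = min(w, to_drop)  # discard outlier weight from the far end
        to_drop -= drop
        cost += (w - drop) * d
    return cost

def relative_error(P, S, w, center, m):
    """Relative gap between the robust cost on the weighted summary (S, w)
    and on the full unweighted dataset P. Assumes the full cost is nonzero."""
    full = robust_cost(P, [1.0] * len(P), center, m)
    approx = robust_cost(S, w, center, m)
    return abs(approx - full) / full
```

A summary $(S, w)$ behaves like an $\varepsilon$-coreset at a given center when `relative_error` is at most $\varepsilon$ there; the guarantee in the abstract requires this for all centers $c$ simultaneously, which a finite set of test centers can only spot-check.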

data mining, dist, machine learning, (19 more...)

arXiv.org Machine Learning

2510.24621

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
(20 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

The Impact of Coreset Selection on Spurious Correlations and Group Robustness

Dharmasiri, Amaya, Yang, William, Kirichenko, Polina, Liu, Lydia, Russakovsky, Olga

arXiv.org Artificial Intelligence, Oct-22-2025

Coreset selection methods have shown promise in reducing the training data size while maintaining model performance for data-efficient machine learning. However, as many datasets suffer from biases that cause models to learn spurious correlations instead of causal features, it is important to understand whether and how dataset reduction methods may perpetuate, amplify, or mitigate these biases. In this work, we conduct the first comprehensive analysis of the implications of data selection on the spurious bias levels of the selected coresets and the robustness of downstream models trained on them. We use an extensive experimental setting spanning ten different spurious correlations benchmarks, five score metrics to characterize sample importance/difficulty, and five data selection policies across a broad range of coreset sizes. Thereby, we unravel a series of nontrivial nuances in interactions between sample difficulty and bias alignment, as well as dataset bias and resultant model robustness. For example, we find that selecting coresets using embedding-based sample characterization scores runs a comparatively lower risk of inadvertently exacerbating bias than selecting using characterizations based on learning dynamics. Most importantly, our analysis reveals that although some coreset selection methods could achieve lower bias levels by prioritizing difficult samples, they do not reliably guarantee downstream robustness.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.1169

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

7f975a56c761db6506eca0b37ce6ec87-Reviews.html

Neural Information Processing Systems, Oct-9-2025, 15:07:57 GMT

NIPS 2013, Neural Information Processing Systems, December 5 - 10, Lake Tahoe, Nevada, USA. Paper ID: 1011. Title: "Distributed k-means and k-median clustering on general communication topologies". Reviews: First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. The paper provides provably efficient algorithms for performing k-means and k-median clustering in the distributed setting. The main focus of the paper is minimizing communication cost in the distributed network. Although I am not very familiar with the literature, the paper seems to provide a very novel idea of distributed coresets that leads to clustering algorithms which provably improve the state-of-the-art communication complexity significantly. Existing approaches only use the idea of approximating coresets by taking the union of local coresets.
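The baseline mentioned at the end of this review, taking the union of local coresets, can be sketched as follows. This illustrates only that baseline, not the distributed coreset construction the paper proposes; each node's local weighted summary is assumed to be given, and the names `kmeans_cost` and `union_of_local_coresets` are hypothetical.

```python
import math

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: each point pays its weight times the squared
    distance to its nearest center."""
    return sum(
        w * min(math.dist(p, c) ** 2 for c in centers)
        for p, w in zip(points, weights)
    )

def union_of_local_coresets(local_coresets):
    """Concatenate the (points, weights) summaries built at each node.

    Because the clustering cost is a sum over points, the cost on the union
    is the sum of the costs on the local summaries, so local approximation
    errors add up rather than compound.
    """
    points, weights = [], []
    for pts, wts in local_coresets:
        points.extend(pts)
        weights.extend(wts)
    return points, weights
```

The drawback of this baseline is that every node must transmit its entire local summary, so the total communication grows with the number of nodes; communication cost is the quantity the reviewed paper focuses on reducing.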

algorithm, communication cost, coreset, (12 more...)

Neural Information Processing Systems

Country: North America > United States > Nevada (0.25)

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)
